Read PDFs Aloud with AI Voice (Text-to-Speech)

Annolid includes an integrated PDF viewer with text-to-speech (TTS), so you can open a PDF and have selected text (or whole paragraphs) read aloud.

Prerequisites

  1. Install Annolid (see install.md).

  2. Install PDF support:

pip install pymupdf

  3. Install a TTS backend (pick one):

  • Recommended (offline, higher-quality “AI voice”): Kokoro (ONNX)

pip install kokoro-onnx onnxruntime gdown

  • Pocket TTS (very lightweight, CPU-only runtime, voices such as alba, marius, javert, jean, fantine, cosette, eponine, and azelma)

pip install pocket-tts

Set Engine = Pocket, then choose one of the built-in voices or type a custom voice ID or prompt path. To clone a voice, point the “Pocket prompt” field at a short WAV recording of it. The Pocket speed control speeds up or slows down the generated speech (0.5–2.0×).

Optionally, install with pip install annolid[pocket_tts] instead, which pulls in the Pocket TTS dependency automatically.

  • Voice cloning (offline, uses a short voice prompt): Chatterbox Turbo (ONNX)

pip install onnxruntime soundfile

Then select Engine = Chatterbox and choose a voice prompt audio file in the PDF Speech dock (or edit ~/.annolid/tts_settings.json).

  • Optional language packs for Kokoro (needed only for Chinese or Japanese voices):

pip install misaki[zh]  # enables Mandarin (e.g., voice zf_001)
pip install misaki[ja]  # enables Japanese (e.g., voice jf_alpha)

  • Fallback (online, simpler): Google TTS

pip install gTTS pydub

pydub needs ffmpeg available on your system.
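
To see which of the backends above are already usable in the current environment, a small check like the following may help. It is only a sketch: the import names are assumptions mapped from the pip package names (pocket_tts in particular is a guess), and the helper names missing_modules and report are made up for this example, not part of Annolid.

```python
import importlib.util
import shutil

# Import names are ASSUMPTIONS derived from the pip package names above;
# "pocket_tts" in particular is a guess and may differ.
BACKENDS = {
    "PDF support (PyMuPDF)": ["fitz"],
    "Kokoro": ["kokoro_onnx", "onnxruntime", "gdown"],
    "Pocket TTS": ["pocket_tts"],
    "Chatterbox Turbo": ["onnxruntime", "soundfile"],
    "Google TTS": ["gtts", "pydub"],
}


def missing_modules(modules):
    """Return the top-level module names that cannot be located."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def report():
    """Build one status line per backend, plus one line for ffmpeg."""
    lines = []
    for name, modules in BACKENDS.items():
        missing = missing_modules(modules)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        lines.append(f"{name}: {status}")
    # pydub shells out to ffmpeg, so look for the executable on PATH.
    ffmpeg = shutil.which("ffmpeg")
    lines.append("ffmpeg: " + (ffmpeg if ffmpeg else "not found on PATH"))
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Run it with the same Python interpreter that runs Annolid; a backend marked "ok" only means its modules can be found, not that audio playback works.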

Open a PDF in Annolid

  1. Launch the GUI:

annolid

  2. Go to File → Open PDF... and pick a .pdf file.

Annolid switches into PDF view and shows these docks (typically on the right):

  • PDF Speech (voice / language / speed)

  • PDF Controls (page + zoom)

  • PDF Reader (click-to-read mode)

Option A: Speak a selection (fastest)

This works in both the fallback viewer (image + text panel) and the PDF.js viewer.

  1. Select some text (either in the page text panel, or directly on the PDF page).

  2. Right-click → Speak selection.

Option B: Click-to-read paragraphs (PDF.js reader mode)

This reads full paragraphs/sentences starting from where you click.

  1. In the PDF Reader dock, enable Use PDF.js (required for reader).

  2. Keep Enable click-to-read turned on.

  3. Click a paragraph in the PDF page to start reading.

  4. Use Pause/Resume, Stop, Prev, and Next in the same dock to control playback.

If the reader says it’s unavailable, install QtWebEngine (pyqtwebengine in conda, or PyQtWebEngine via pip) and restart Annolid.

Change voice, language, and speed

Use the PDF Speech dock to set:

  • Voice (example: af_sarah)

  • Voice (Chinese): zf_001 (requires misaki[zh])

  • Voice (Japanese): jf_alpha (requires misaki[ja])

  • Language (example: en-us)

  • Speed (0.5–2.0)

These settings persist in ~/.annolid/tts_settings.json.
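
To inspect what is currently saved, the file can be read from Python. This is a minimal sketch that assumes the file holds a single JSON object; the key names are not documented here, so the script prints whatever it finds rather than expecting specific fields. load_tts_settings is a hypothetical helper, not an Annolid function.

```python
import json
from pathlib import Path

# Path taken from the text above.
SETTINGS_PATH = Path.home() / ".annolid" / "tts_settings.json"


def load_tts_settings(path=SETTINGS_PATH):
    """Return the saved settings as a dict, or {} if missing or unreadable."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # Missing file, permission problem, or invalid JSON.
        return {}
    # ASSUMPTION: the top level is a JSON object; anything else is ignored.
    return data if isinstance(data, dict) else {}


if __name__ == "__main__":
    for key, value in sorted(load_tts_settings().items()):
        print(f"{key} = {value!r}")
```

If you edit the file by hand, back it up first and close Annolid, since the application may rewrite it when settings change in the dock.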

Troubleshooting

  • “PyMuPDF Required” dialog: run pip install pymupdf.

  • No audio output:

    • Make sure ANNOLID_DISABLE_AUDIO is not set.

    • On Linux servers/containers, ensure an audio device is present (or use a desktop machine).

  • First Kokoro run is slow: Annolid downloads model files into ~/.annolid/kokoro the first time.

  • gTTS fails: it requires internet access; also ensure ffmpeg is installed for pydub.
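
Two of the checks above can be scripted. The sketch below only looks at whether ANNOLID_DISABLE_AUDIO is present in the environment and whether ~/.annolid/kokoro already contains files; how Annolid interprets the variable's value, and which files it expects in that folder, are not covered here, and the function names are hypothetical.

```python
import os
from pathlib import Path

# Model folder taken from the text above.
KOKORO_DIR = Path.home() / ".annolid" / "kokoro"


def audio_disabled(env=None):
    """True if ANNOLID_DISABLE_AUDIO is present in the environment.

    ASSUMPTION: any value counts as "set"; the variable's exact
    semantics inside Annolid are not checked here.
    """
    env = os.environ if env is None else env
    return "ANNOLID_DISABLE_AUDIO" in env


def kokoro_models_present(cache_dir=KOKORO_DIR):
    """True if the Kokoro folder exists and contains at least one entry."""
    cache_dir = Path(cache_dir)
    return cache_dir.is_dir() and any(cache_dir.iterdir())


if __name__ == "__main__":
    if audio_disabled():
        print("ANNOLID_DISABLE_AUDIO is set; unset it to get audio output.")
    if not kokoro_models_present():
        print("No Kokoro files found yet; expect a slow first run while they download.")
```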